Preparing for the Unknown: Learning a Universal Policy with Online System Identification

نویسندگان

  • Wenhao Yu
  • Jie Tan
  • C. Karen Liu
  • Greg Turk
چکیده

We present a new method of learning control policies that successfully operate under unknown dynamic models. We create such policies by leveraging a large number of training examples that are generated using a physical simulator. Our system is made of two components: a Universal Policy (UP) and a function for Online System Identification (OSI). We describe our control policy as universal because it is trained over a wide array of dynamic models. These variations in the dynamic model may include differences in mass and inertia of the robots components, variable friction coefficients, or unknown mass of an object to be manipulated. By training the Universal Policy with this variation, the control policy is prepared for a wider array of possible conditions when executed in an unknown environment. The second part of our system uses the recent state and action history of the system to predict the dynamics model parameters μ. The value of μ from the Online System Identification is then provided as input to the control policy (along with the system state). Together, UP-OSI is a robust control policy that can be used across a wide range of dynamic models, and that is also responsive to sudden changes in the environment. We have evaluated the performance of this system on a variety of tasks, including the problem of cart-pole swing-up, the double inverted pendulum, locomotion of a hopper, and block-throwing of a manipulator. UP-OSI is effective at these tasks across a wide range of dynamic models. Moreover, when tested with dynamic models outside of the training range, UP-OSI outperforms the Universal Policy alone, even when UP is given the actual value of the model dynamics. In addition to the benefits of creating more robust controllers, UP-OSI also holds out promise of narrowing the Reality Gap between simulated and real physical systems.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Optimal adaptive leader-follower consensus of linear multi-agent systems: Known and unknown dynamics

In this paper, the optimal adaptive leader-follower consensus of linear continuous time multi-agent systems is considered. The error dynamics of each player depends on its neighbors’ information. Detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...

متن کامل

An Online Q-learning Based Multi-Agent LFC for a Multi-Area Multi-Source Power System Including Distributed Energy Resources

This paper presents an online two-stage Q-learning based multi-agent (MA) controller for load frequency control (LFC) in an interconnected multi-area multi-source power system integrated with distributed energy resources (DERs). The proposed control strategy consists of two stages. The first stage is employed a PID controller which its parameters are designed using sine cosine optimization (SCO...

متن کامل

A Higher Order Online Lyapunov-Based Emotional Learning for Rough-Neural Identifiers

o enhance the performances of rough-neural networks (R-NNs) in the system identification‎, ‎on the base of emotional learning‎, ‎a new stable learning algorithm is developed for them‎. ‎This algorithm facilitates the error convergence by increasing the memory depth of R-NNs‎. ‎To this end‎, ‎an emotional signal as a linear combination of identification error and its differences is used to achie...

متن کامل

Fuzzy adaptive tracking control for a class of nonlinearly parameterized systems with unknown control directions

This paper addresses the problem of adaptive fuzzy tracking control for aclass of nonlinearly parameterized systems with unknown control directions.In this paper, the nonlinearly parameterized functions are lumped into the unknown continuous functionswhich can be approximated by using the fuzzy logic systems (FLS) in Mamdani type. Then, the Nussbaum-type function is used to de...

متن کامل

Iterative learning identification and control for dynamic systems described by NARMAX model

A new iterative learning controller is proposed for a general unknown discrete time-varying nonlinear non-affine system represented by NARMAX (Nonlinear Autoregressive Moving Average with eXogenous inputs) model. The proposed controller is composed of an iterative learning neural identifier and an iterative learning controller. Iterative learning control and iterative learning identification ar...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1702.02453  شماره 

صفحات  -

تاریخ انتشار 2017